Abstract

Selection and ranking problems have been studied over the last thirty years,
generally under one of two formulations: Bechhofer’s indifference-zone
approach and Gupta’s subset selection approach. This paper deals with subset
selection. Subset selection procedures in multivariate models are briefly
reviewed. These include: (1) procedures for selecting the best component in
a multivariate normal population in terms of the component means as well as
the component variances (Section 2); (2) procedures for selecting the best
from several multivariate normal populations in terms of (i) the Mahalanobis
distance, (ii) the generalized variance, and (iii) the multiple correlation
coefficient (Section 3); (3) procedures (fixed sample size as well as inverse
sampling) for selecting the most (least) probable cell in a multinomial
distribution (Section 4); (4) procedures for selecting the best from several
multinomial populations in terms of the Shannon entropy function (Section 5);
and (5) procedures for choosing the best subset of the predictor variables in
a linear regression model (Section 6).
STATISTICAL SELECTION PROCEDURES IN
MULTIVARIATE MODELS


1.  INTRODUCTION 

Since statistical inference problems were first posed in the now-familiar
“selection and ranking” framework over three decades ago, these problems
have been studied from several points of view using various goals
and formulations. However, selection from multivariate populations
is an important topic that has not been adequately studied in the
literature. Our interest here is to briefly review developments
pertaining to selection from multivariate models. In doing so, we consider:
(1) selection from a single multivariate normal population, (2) selection
from several multivariate normal populations, (3) selection from
a multinomial population, (4) selection from several multinomial
populations, and (5) selection from a set of predictor variables in a
regression model.

For ranking multivariate populations, the investigations have usually
chosen a scalar function of the unknown parameters as the ranking measure.
This permits a complete ordering of the populations. The choice of
the ranking measure depends, of course, on the specific situation.
The selection procedure in these cases depends on a suitably chosen
statistic which has a univariate distribution.

Let us consider $k$ independent populations $\pi_1, \ldots, \pi_k$, where
$\pi_i$ has the underlying distribution function $F_{\theta_i}$,
$i = 1, \ldots, k$. The $\theta_i$ are unknown real-valued parameters;
these represent the values of a certain quality characteristic $\theta$
for the $k$ populations. The populations are ranked according to their
$\theta$-values. To be specific, $\pi_i$ is defined to be better than
$\pi_j$ if $\theta_i > \theta_j$. The ordered $\theta_i$ are denoted
by $\theta_{[1]} \le \cdots \le \theta_{[k]}$. It is assumed that there is
no prior knowledge regarding the correct pairing of the ordered and the
unordered $\theta_i$. Selection problems have been generally studied under
one of two formulations, namely, (1) the indifference-zone and (2) the
subset selection formulations.

Considering the basic problem of selecting the best population
(i.e., the population associated with $\theta_{[k]}$), the indifference-zone
formulation of Bechhofer (1954) requires that one of the $k$ populations
be chosen as the best. A correct selection (CS) is said to occur when
any population associated with $\theta_{[k]}$ is selected. Any valid
procedure $R$ must guarantee a specified minimum probability of a correct
selection (PCS) whenever the best and the next best populations are
sufficiently (to be specified) apart. Let
$\delta(\theta_{[k]}, \theta_{[k-1]})$ denote an appropriately chosen
measure of the separation between the best and the next best populations,
and $P(CS|R)$ denote the PCS using the rule $R$. Further, let

$$\Omega_{\delta^*} = \{\theta \mid \theta = (\theta_1, \ldots, \theta_k),\ \delta(\theta_{[k]}, \theta_{[k-1]}) \ge \delta^* > 0\}. \tag{1}$$

Any  valid  rule  R  should  satisfy 

$$P(CS|R) \ge P^* \quad \text{whenever } \theta \in \Omega_{\delta^*}. \tag{2}$$

Both $\delta^*$ and $P^* \in (1/k, 1)$ are specified by the experimenter in
advance. Suppose $R$ is based on samples of size $n$ from each population.
Then the problem is to determine the smallest $n$ for which the requirement
(2) is satisfied. It should be noted that no guarantee has to be met when
$\theta$ belongs to $\Omega_{\delta^*}^c$, the complement of
$\Omega_{\delta^*}$. The region $\Omega_{\delta^*}^c$ is the
“indifference zone” lending its name to the formulation.

In  the  subset  selection  formulation  studied  extensively  beginning 
with  the  pioneering  work  of  Gupta  (1956,  1965),  the  basic  problem 
is  to  select  a  nonempty  subset  of  the  k  populations  so  that  the  best 
population  is  included  in  the  selected  subset  with  a  specified  minimum 
PCS. The size of $S$, the selected subset, is not determined in advance
but by the data themselves. Selection of any subset that includes the best
population results in a correct selection. Letting $\Omega$ denote the
entire parameter space, any valid rule $R$ should satisfy

$$P(CS|R) \ge P^* \quad \text{for all } \theta \in \Omega. \tag{3}$$

This requirement (3) is called the basic probability requirement, or the
$P^*$-condition. Any configuration $\theta$ which yields the infimum of PCS
over $\Omega$ is called a least favorable configuration (LFC).

The expected value of $|S|$, the size of $S$, is a reasonable and generally
used measure of the performance of a valid rule. Some other possible
measures (considered by a few authors) are $E(|S|)/P(CS|R)$ and
$E(|S|) - P(CS|R)$, the latter being the expected number of non-best
populations included in $S$.

There are many variations and generalizations of the basic formulation
using either of the two approaches described above. There
are also related problems such as selecting populations that are better
than a standard or a control. A comprehensive survey of the developments
encompassing all these aspects with an extensive bibliography
is given by Gupta and Panchapakesan (1979). Recently, Gupta and
Panchapakesan (1985) have provided a critical review of developments
in the subset selection theory with historical perspectives. For a
categorized bibliography, see Dudewicz and Koo (1982).

In the present paper, we are concerned with subset selection
procedures for multivariate populations. In Section 2, we discuss
selection of the best component in a multivariate normal population in
terms of the means as well as the variances. Selection from several
multivariate normal populations is discussed in Section 3 using different
criteria such as the Mahalanobis distance, the generalized variance,
and the multiple correlation coefficient. Section 4 deals with
selecting the most probable and the least probable cells in a multinomial
distribution. Selection from several multinomial populations is
discussed in Section 5 using the Shannon entropy function for comparison
of the populations. Finally, Section 6 describes subset selection
procedures for choosing a best set of predictor variables in a linear
regression model.


2.  SELECTION  FROM  A  SINGLE  MULTIVARIATE  NORMAL 
POPULATION 

Consider a $p$-variate normal population $N_p(\mu, \Sigma)$ with mean vector
$\mu' = (\mu_1, \ldots, \mu_p)$ and covariance matrix
$\Sigma = (\sigma_{ij})$, which is assumed to be positive definite. In this
section, we consider ranking the $p$ components according to their means
$\mu_i$ and according to their variances $\sigma_{ii}$.


2.1.  Selection  in  Terms  of  the  Means 

Let $\bar{X}' = (\bar{X}_1, \ldots, \bar{X}_p)$ be the sample mean based on
$n$ independent (vector) observations from the population. We first consider
the case of known $\Sigma$ and assume, without loss of generality, that
$\sigma_{ii} = 1$ for $i = 1, \ldots, p$. For selecting the component
associated with $\mu_{[p]}$, the largest $\mu_i$, Gnanadesikan (1966)
considered the procedure

$$R_1:\ \text{Select the } i\text{th component if and only if } \bar{X}_i \ge \bar{X}_{[p]} - \frac{d_1}{\sqrt{n}} \tag{4}$$

where $\bar{X}_{[1]} \le \cdots \le \bar{X}_{[p]}$ denote the ordered
$\bar{X}_i$, and $d_1 = d_1(n, p, \Sigma) > 0$ is the smallest number such
that the $P^*$-condition is satisfied. It is easily shown that

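$$\inf_{\Omega} P(CS|R_1) = \Pr\{Y_p \ge Y_j - d_1,\ j = 1, \ldots, p-1\} \tag{5}$$

where $Y_i = \sqrt{n}(\bar{X}_{(i)} - \mu_{[i]})$, $\bar{X}_{(i)}$ is the
component sample mean associated with $\mu_{[i]}$, and
$\Omega = \{\mu : -\infty < \mu_i < \infty,\ i = 1, \ldots, p\}$.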

For evaluating $d_1$ for which the right-hand side of (5) equals $P^*$, we
need to know $A = (a_{ij})$, the covariance matrix of
$Y' = (Y_1, \ldots, Y_p)$. Even though $\Sigma$ is known, we do not know the
correspondence between the $\sigma_{ij}$ and the $a_{ij}$ except when
$p = 2$. For $p = 2$, the right-hand side of (5) equals
$\Phi[d_1/\sqrt{2(1 - \sigma_{12})}]$, where $\Phi(\cdot)$ is the cdf of a
standard normal random variable; this gives

$$d_1 = d_1(n, 2, \Sigma) = \sqrt{2(1 - \sigma_{12})}\,\Phi^{-1}(P^*). \tag{6}$$
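As a quick numerical illustration of the $p = 2$ case, solving $\Phi[d_1/\sqrt{2(1-\sigma_{12})}] = P^*$ gives $d_1 = \sqrt{2(1-\sigma_{12})}\,\Phi^{-1}(P^*)$. The following is a minimal Python sketch of that computation; the function name `d1_two_components` is hypothetical, unit variances are assumed as in the text, and $\Phi^{-1}$ is taken from the standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def d1_two_components(p_star: float, sigma12: float) -> float:
    """Constant d_1 for p = 2 with unit variances (illustrative helper).

    p_star  : required minimum probability of a correct selection, in (1/2, 1)
    sigma12 : known correlation between the two components, in (-1, 1)
    """
    if not 0.5 < p_star < 1.0:
        raise ValueError("P* must lie in (1/2, 1) when p = 2")
    if not -1.0 < sigma12 < 1.0:
        raise ValueError("sigma12 must lie in (-1, 1)")
    # Invert Phi[d_1 / sqrt(2 (1 - sigma12))] = P* for d_1.
    return sqrt(2.0 * (1.0 - sigma12)) * NormalDist().inv_cdf(p_star)
```

For instance, `d1_two_components(0.95, 0.0)` is roughly 2.33, and a positive correlation between the two components reduces the constant.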

For $p > 2$, Gnanadesikan (1966) obtained two different lower bounds
for the infimum of PCS. Letting
$d_{01} = \min\{d_1/\sqrt{2(1 - a_{pj})},\ j = 1, \ldots, p-1\}$, one gets

$$\inf_{\Omega} P(CS|R_1) \ge \Pr\{Z_j \le d_{01},\ j = 1, \ldots, p-1\} \tag{7}$$

where $Z' = (Z_1, \ldots, Z_{p-1})$ has the $N_{p-1}(0, B)$ distribution and
$B$ has a known structure with elements being $0$, or
$[2(1 - a_{jp})]^{-1/2}$, or $-[2(1 - a_{jp})]^{-1/2}$,
$j = 1, \ldots, p-1$. One lower bound for the right-hand side
of (7) obtained by Gnanadesikan (1966) is $\Phi^{p-1}(d_{01})$ based on an
inequality due to Slepian (1962). The other lower bound is
$(2 - p) + (p - 1)\Phi(d_{01})$ obtained by using a Bonferroni inequality.
For $p = 2$, the two bounds coincide. While $d_{01}$, using either lower
bound, is a conservative value for $d_1$, the computations of Gnanadesikan
(1966) show that $d_{01}$ in the former case (Slepian inequality) is closer
to the exact value. However, the difference between the two approximate
values decreases as $P^*$ increases and is very small for $P^* \ge .90$.
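The two conservative values can be compared numerically by equating each lower bound to $P^*$: the Slepian-type bound gives $\Phi^{p-1}(d) = P^*$ and the Bonferroni bound gives $(2 - p) + (p - 1)\Phi(d) = P^*$. The sketch below, with hypothetical function names and using only the standard-library normal distribution, solves both equations in closed form. It is an illustration of the two bounds, not a reproduction of the published computations.

```python
from statistics import NormalDist

_PHI = NormalDist()  # standard normal distribution


def d01_slepian(p: int, p_star: float) -> float:
    """Solve Phi(d)**(p - 1) = P* for d (Slepian-type lower bound)."""
    return _PHI.inv_cdf(p_star ** (1.0 / (p - 1)))


def d01_bonferroni(p: int, p_star: float) -> float:
    """Solve (2 - p) + (p - 1) * Phi(d) = P* for d (Bonferroni lower bound)."""
    return _PHI.inv_cdf(1.0 - (1.0 - p_star) / (p - 1))


if __name__ == "__main__":
    # Tabulate both values and their gap for p = 5 components.
    for p_star in (0.75, 0.90, 0.95, 0.99):
        s, b = d01_slepian(5, p_star), d01_bonferroni(5, p_star)
        print(f"P*={p_star:.2f}  Slepian={s:.4f}  Bonferroni={b:.4f}  gap={b - s:.4f}")
```

Since $(P^*)^{1/(p-1)} \le 1 - (1 - P^*)/(p - 1)$, the Slepian-based value never exceeds the Bonferroni-based one, and the gap shrinks as $P^*$ approaches 1.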

The determination of the constant $d_1$ becomes easier when
$\sigma_{ij} = \rho > 0$, $i \ne j$. In this case, we get

$$\inf_{\Omega} P(CS|R_1) = \int_{-\infty}^{\infty} \Phi^{p-1}\left(x + \frac{d_1}{\sqrt{1 - \rho}}\right) d\Phi(x), \tag{8}$$

and the values of $H = d_1/\sqrt{2(1 - \rho)}$ are tabulated by Gupta (1963a)
and by Gupta, Nagel and Panchapakesan (1973), who have also considered the
selection problem in this special case.
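In the equicorrelated case the infimum of PCS reduces to the one-dimensional integral $\int \Phi^{p-1}(x + d_1/\sqrt{1-\rho})\,d\Phi(x)$, which follows from (5) by writing each $Y_j$ as a common factor plus independent noise. Where the tables are not at hand, $d_1$ can be approximated numerically. The following is a rough sketch under that assumption: the function names are hypothetical, the integral is evaluated by a composite Simpson rule on $[-8, 8]$, and the root is found by bisection.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

_PHI = NormalDist()  # standard normal distribution


def pcs_equicorrelated(d1: float, p: int, rho: float, steps: int = 2000) -> float:
    """Integral of Phi(x + d1/sqrt(1 - rho))**(p - 1) against the N(0,1) density.

    Composite Simpson rule on [-8, 8]; `steps` must be even. The normal
    density is negligible outside that interval.
    """
    shift = d1 / sqrt(1.0 - rho)
    lo, hi = -8.0, 8.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        density = exp(-0.5 * x * x) / sqrt(2.0 * pi)
        total += weight * _PHI.cdf(x + shift) ** (p - 1) * density
    return total * h / 3.0


def solve_d1(p: int, rho: float, p_star: float, tol: float = 1e-8) -> float:
    """Approximate the smallest d1 >= 0 whose integral reaches P*, by bisection."""
    lo, hi = 0.0, 1.0
    while pcs_equicorrelated(hi, p, rho) < p_star:  # bracket the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pcs_equicorrelated(mid, p, rho) < p_star:
            lo = mid
        else:
            hi = mid
    return hi
```

For $p = 2$ the result should agree with the closed form $\sqrt{2(1-\rho)}\,\Phi^{-1}(P^*)$, which gives a convenient sanity check. The tabulated quantity corresponds to `solve_d1(p, rho, p_star) / sqrt(2 * (1 - rho))`.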

When the covariance matrix $\Sigma$ is unknown, let us assume that
$\sigma_{ii} = \sigma^2$ for $i = 1, \ldots, p$, and let $s_\nu^2$ denote an
estimator of $\sigma^2$ on $\nu$ degrees of freedom, statistically
independent of the $\bar{X}_i$. In this case, Gnanadesikan (1966) proposed
the procedure

$$R_2:\ \text{Select the } i\text{th component if and only if } \bar{X}_i \ge \bar{X}_{[p]} - \frac{d_2 s_\nu}{\sqrt{n}} \tag{9}$$

where $d_2 = d_2(\nu, p, P^*) > 0$ is the smallest number for which the
$P^*$-condition is satisfied. For this procedure,

$$\inf_{\Omega} P(CS|R_2) \ge \Pr\{t_i \le d_{01},\ i = 1, \ldots, p-1\} \ge 1 - \sum_{i=1}^{p-1} \Pr\{t_i > d_{01}\} \tag{10}$$


where $t_i = Z_i/s_\nu$, $Z' = (Z_1, \ldots, Z_{p-1})$ has the same
distribution as in the known $\Sigma$ case, $\nu s_\nu^2/\sigma^2$ has a
chi-square distribution with $\nu$ degrees of freedom, $d_{01}$ is defined
as before, and $\Omega = \{(\mu, \Sigma)\}$. Equating the last member of the
inequalities in (10) to $P^*$, an approximate value of $d_{01}$ is given by

$$(2 - p) + (p - 1)G_\nu(d_{01}) = P^* \tag{11}$$

where $G_\nu(\cdot)$ is the cdf of a Student’s $t$ variable with $\nu$
degrees of freedom. In the special case of $\sigma_{ij} = \rho\sigma^2$,
$\rho > 0$, $d_{01}$ can be evaluated as an equicoordinate percentage point
of a multivariate $t$ distribution. The $d_{01}$ values are tabulated by
Gupta and Sobel (1957), Krishnaiah and Armitage (1966), and Gupta,
Panchapakesan and Sohn (1985).


2.2.  Selection  in  Terms  of  the  Variances 

We now define the best component as the one associated with the
smallest $\sigma_{ii}$. A natural procedure is analogous to that of Gupta
and Sobel (1962a) in the uncorrelated case. This procedure is

$$R_3:\ \text{Select the } i\text{th component if } s_{ii} \le \frac{1}{c} \min_{1 \le j \le p} s_{jj} \tag{12}$$

where $c = c(p, n, P^*) \in (0, 1)$ is the largest number for which the
$P^*$-condition is satisfied, and $S = (s_{ij})$ is the sample covariance
matrix based on $n$ independent (vector) observations from the population.
This procedure has been considered by Frischtak (1973), who has
shown that, for $p = 2$, the infimum of PCS is attained when
$\sigma_{11} = \sigma_{22}$ and $\sigma_{12} = 0$. Thus $c$ can be obtained
from the tables of Gupta and Sobel (1962b).

For $p \ge 3$, Frischtak (1973) obtained only an asymptotic
($n \to \infty$) solution, using the asymptotic normality of
$\log(s_{(j)}/s_{(1)})$, $j = 2, \ldots, p$, after suitable normalization;
here $s_{(i)}$ is the $s_{ii}$ associated with the $i$th smallest
$\sigma_{ii}$. The asymptotic solution $c$ is given by

$$\Pr\{Y_j \le c,\ j = 2, \ldots, p\} = P^* \tag{13}$$

where the $Y_j$ are standard normal random variables with equal
correlation 0.5, and can be obtained from the tables of Gupta (1963a) and
Gupta, Nagel and Panchapakesan (1973).


3.  SELECTION  FROM  SEVERAL  MULTIVARIATE  NORMAL 
POPULATIONS 

Let $\pi_1, \ldots, \pi_k$ be $k$ $p$-variate normal populations,
$N_p(\mu_i, \Sigma_i)$, $i = 1, \ldots, k$, where the $\mu_i$ are the mean
vectors and the $\Sigma_i$ are positive definite covariance matrices. For
defining the best population, several measures have been used such as the
generalized variance, Mahalanobis distance, and the multiple correlation
coefficient. Also, comparison with a control has been studied using as
criteria linear combinations of the elements of the mean vector and those
of the covariance matrix. We now discuss these briefly.


3.1.  Selection  in  Terms  of  Mahalanobis  Distance 

Let $\lambda_i = \mu_i' \Sigma_i^{-1} \mu_i$, the Mahalanobis distance of
$\pi_i$ from the origin. We first assume that the $\Sigma_i$ are known. Let
$X_{ij}$, $j = 1, \ldots, n$, denote $n$ (vector) observations from $\pi_i$,
$i = 1, \ldots, k$. Define $Y_{ij} = X_{ij}' \Sigma_i^{-1} X_{ij}$ and
$Y_i = \sum_{j=1}^{n} Y_{ij}$. For selecting a subset containing the
population associated with $\lambda_{[k]}$, Gupta (1966) proposed the
procedure

$$R_4:\ \text{Select } \pi_i \text{ if and only if } Y_i \ge c_4 Y_{[k]} \tag{14}$$

where $0 < c_4 = c_4(k, p, n, P^*) < 1$ is to be chosen suitably to meet
the $P^*$-condition. It has been shown [Gupta (1966) and Gupta and
Studden (1970)] that the infimum of PCS occurs when
$\lambda_1 = \cdots = \lambda_k = 0$. Thus the constant $c_4$ is given by

$$\int_0^{\infty} G_\nu^{k-1}\left(\frac{x}{c_4}\right) dG_\nu(x) = P^* \tag{15}$$

where $G_\nu(x)$ is the cdf of a standardized (i.e., unit scale parameter)
gamma variable with $\nu = np/2$ degrees of freedom. The values of $c_4$
are tabulated by Gupta (1963b) and Armitage and Krishnaiah (1964).
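The left-hand side of (15) is the probability that the statistic of one tagged population is at least $c_4$ times the largest of $k$ independent standardized gamma variables with shape $\nu = np/2$. A simple way to check a candidate constant is therefore simulation. The sketch below is a Monte Carlo illustration only: the function name and the default replication count are assumptions, and it uses `random.gammavariate` from the standard library rather than the published tables.

```python
import random


def pcs_r4_at_lfc(k: int, p: int, n: int, c4: float,
                  reps: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the left-hand side of (15)."""
    rng = random.Random(seed)
    nu = n * p / 2.0  # shape of the standardized gamma variable
    hits = 0
    for _ in range(reps):
        y = [rng.gammavariate(nu, 1.0) for _ in range(k)]
        # Population 0 plays the role of the best one; the rule retains it
        # when its statistic is at least c4 times the largest statistic.
        if y[0] >= c4 * max(y):
            hits += 1
    return hits / reps
```

With `c4 = 1` the estimate should be close to $1/k$, and it increases toward 1 as `c4` decreases, so a suitable constant can be located by scanning `c4` downward until the estimate reaches $P^*$.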

An analogous procedure can be defined for selecting the population
with the smallest $\lambda_i$. In this case, the appropriate constant can
be obtained from the tables of Gupta and Sobel (1962b) and Krishnaiah
and Armitage (1964).

It should be noted that the procedure $R_4$ is based on the statistics
$Y_i = \sum_{j=1}^{n} X_{ij}' \Sigma_i^{-1} X_{ij}$ rather than
$Z_i = \bar{X}_i' \Sigma_i^{-1} \bar{X}_i$, where $\bar{X}_i$ denotes the
sample mean vector from $\pi_i$. If we use $Z_i$ instead of $Y_i$
in $R_4$, the infimum of PCS and hence the constant $c_4$ do not depend
on $n$. This makes the procedure unsatisfactory. One can, of
course, use a different type of procedure. For example, we can define
$R'$: Select $\pi_i$ if and only if $Z_i \ge Z_{[k]} - d$, $d > 0$. Such a
procedure has not been investigated.

When the $\Sigma_i$ are unknown and not necessarily equal, Gupta and
Studden (1970) proposed and studied the rule

$$R_5:\ \text{Select } \pi_i \text{ if and only if } T_i \ge c_5 T_{[k]} \tag{16}$$

where $T_i = \bar{X}_i' S_i^{-1} \bar{X}_i$, $S_i$ is the usual sample
covariance matrix with $(n - 1)$ as the divisor, and
$0 < c_5 = c_5(k, n, p, P^*) < 1$ is chosen suitably to satisfy the
$P^*$-condition. It has been shown by Gupta and Studden (1970) that

$$\inf_{\Omega} P(CS|R_5) = \int_0^{\infty} F_{p,n-p}^{k-1}\left(\frac{x}{c_5}\right) dF_{p,n-p}(x) \tag{17}$$

where $F_{p,n-p}(x)$ is the cdf of a central $F$-variable with $p$ and
$n - p$ degrees of freedom. The values of $c_5$ for which the right-hand
side of (17) equals $P^*$ have been tabulated by Gupta and Panchapakesan
(1969) for various values of $k$, $P^*$, $p$, and $n$.

Gupta and Studden (1970) also studied the problem of selecting
the population associated with the smallest $\lambda_i$. Their rule is

$$R_5':\ \text{Select } \pi_i \text{ if and only if } T_i \le \frac{1}{c_5'} T_{[1]} \tag{18}$$

where $0 < c_5' = c_5'(k, n, p, P^*) < 1$ is to be chosen suitably. In this
case,

$$\inf_{\Omega} P(CS|R_5') = \int_0^{\infty} [1 - F_{p,n-p}(c_5' x)]^{k-1}\, dF_{p,n-p}(x). \tag{19}$$

The constant $c_5'$ for which the right-hand side of (19) equals $P^*$ has
been tabulated by Gupta and Panchapakesan (1969) for several combinations
of $k$, $P^*$, $p$, and $n$.

When $\Sigma_1 = \cdots = \Sigma_k = \Sigma$ and $\Sigma$ is unknown, one
would define a procedure with $T_i = \bar{X}_i' S^{-1} \bar{X}_i$ in $R_5$,
where $S$ is the usual pooled estimator of $\Sigma$. This procedure was
proposed by Gupta and Studden (1970) and studied later by Chattopadhyay
(1981). He has discussed evaluation of the constant in an approximate
sense, i.e., the infimum of PCS is approximately $P^*$ but can be on either
side.


3.2.  Selection  in  Terms  of  the  Generalized  Variance 

It is meaningful to rank multivariate normal populations according
to the amounts of dispersion in them. A frequently used measure
of dispersion is the generalized variance, which is the determinant of
the covariance matrix. Let $\theta_i = |\Sigma_i|$, $i = 1, \ldots, k$. We
define the best population as the one associated with the smallest
$\theta_i$. Let $S_i$ be the sample covariance matrix based on a sample of
size $n$ from $\pi_i$, $i = 1, \ldots, k$. Gnanadesikan and Gupta (1970)
proposed the rule

$$R_6:\ \text{Select } \pi_i \text{ if and only if } W_i \le \frac{1}{c_6} W_{[1]} \tag{20}$$

where $W_i = |S_i|$, and $0 < c_6 = c_6(k, n, p, P^*) < 1$ is to be chosen
suitably to satisfy the $P^*$-condition. It has been shown that

$$\inf_{\Omega} P(CS|R_6) = \Pr\{Y_1 \le \frac{1}{c_6} Y_j,\ j = 2, \ldots, k\} \tag{21}$$

where $Y_1, \ldots, Y_k$ are independent and identically distributed, each
being the product of $p$ independent factors, the $r$th factor having a
chi-square distribution with $(n - r)$ degrees of freedom. An exact
solution for $c_6$ is obtained in the case of $p = 2$, using the fact that
$2(n - 1)(W_i/\theta_i)^{1/2}$ is then distributed as a chi-square variable
with $2(n - 2)$ degrees of freedom. The constant $c_6$ in this case can be
obtained from the tables of Gupta and Sobel (1962b) and Krishnaiah
and Armitage (1964).

When $p > 2$, one can use Hoel’s approximation of the distribution
of $Y_i^{1/p}$ by a gamma distribution with scale parameter $\theta^{-1}$
and shape parameter $m$, where $2m = p(n - p)$ and
$2\theta = p[1 - (2n)^{-1}(p - 1)(p - 2)]^{1/p}$. Another approximation is
that of $p^{-1} \log Y_i$, using the normal approximation of
$\log \chi^2$. Gnanadesikan and Gupta (1970) have studied these
approximations.

Some alternative procedures have been proposed by Regier
(1976). These procedures are $R_6'$: Select $\pi_i$ if and only if
$W_i \le a \left(\prod_{j=1}^{k} W_j\right)^{1/k}$, and $R_6''$: Select
$\pi_i$ if and only if $W_i \le b \sum_{j=1}^{k} W_j / k$.
Again, the evaluation of the constants $a$ and $b$ is based on the normal
approximation to $\log \chi^2$ and the asymptotic distribution of the
sample variance, respectively. Regier (1976) has given some numerical
comparisons of the three procedures.


3.3.  Selection  in  Terms  of  Multiple  Correlation  Coefficient 

We now assume that the $\mu_i$ and $\Sigma_i$ are unknown. Let $\rho_i$
denote the multiple correlation coefficient between the first variable and
the rest in $\pi_i$. It is a measure of dependence between the two
partitioned sets. Gupta and Panchapakesan (1969) investigated the problem of
selecting a subset containing the population associated with $\rho_{[k]}$
($\rho_{[1]}$). Let $R_i$ denote the multiple correlation coefficient
between the first variable and the rest from the sample $X_{ij}$,
$j = 1, \ldots, n$. Two cases arise: (1) the conditional case in which the
variables 2 to $p$ are fixed, and (2) the unconditional case in which all
variables are random. Let $R_i^{*2} = R_i^2/(1 - R_i^2)$,
$i = 1, \ldots, k$. Gupta and Panchapakesan (1969) proposed the rule

$$R_7:\ \text{Select } \pi_i \text{ if and only if } R_i^{*2} \ge c_7 R_{[k]}^{*2} \tag{22}$$

for selecting the population associated with $\rho_{[k]}$, and the rule

$$R_7':\ \text{Select } \pi_i \text{ if and only if } R_i^{*2} \le \frac{1}{c_7'} R_{[1]}^{*2} \tag{23}$$

for selecting the population associated with $\rho_{[1]}$, where
$0 < c_7 = c_7(k, p, n - p, P^*) < 1$ and
$0 < c_7' = c_7'(k, p, n - p, P^*) < 1$ are chosen suitably to meet the
$P^*$-condition. The procedures proposed are the same for the conditional
as well as the unconditional case. When $\rho_i \ne 0$, the distribution of
$R_i^{*2}$ is different in these two cases. However, the infimum of PCS
occurs in either case when $\rho_1 = \cdots = \rho_k = 0$. The distribution
of $R_i^{*2}$ is the same in either case when $\rho_i = 0$. Thus, in either
case, the constants $c_7$ and $c_7'$ are given by

$$\int_0^{\infty} F_{2q,2m}^{k-1}\left(\frac{x}{c_7}\right) dF_{2q,2m}(x) = P^* \tag{24}$$

and

$$\int_0^{\infty} [1 - F_{2q,2m}(c_7' x)]^{k-1}\, dF_{2q,2m}(x) = P^* \tag{25}$$

where $q = (p - 1)/2$, $m = (n - p)/2$, and $F_{2q,2m}(x)$ is the cdf of
an $F$-variable with $2q$ and $2m$ degrees of freedom. The values of $c_7$
for selected values of $k$, $P^*$, $m$, and $q$ are tabulated by Gupta and
Panchapakesan (1969). The values of $c_7'$ can be obtained from the
same tables because $c_7'(k, q, m, P^*) = c_7(k, m, q, P^*)$.


3.4.  Selection  in  Terms  of  Other  Measures 


Suppose the $p$ variables under consideration are partitioned into two
sets consisting of $q_1$ and $q_2$ $(q_1 + q_2 = p)$ variables. Let the
corresponding partition of $\Sigma_i$ be denoted by

$$\Sigma_i = \begin{pmatrix} \Sigma_{11}^{(i)} & \Sigma_{12}^{(i)} \\ \Sigma_{21}^{(i)} & \Sigma_{22}^{(i)} \end{pmatrix}.$$

Selection in terms of the conditional generalized variance of the
$q_2$-set given the $q_1$-set has been considered by Gupta and
Panchapakesan (1969). Frischtak (1973) discussed selection in terms of
$\gamma_i = |\Sigma_i| / (|\Sigma_{11}^{(i)}|\,|\Sigma_{22}^{(i)}|)$
but has obtained only an asymptotic solution.

For  the  problem  of  selecting  populations  that  are  better  than  a 
control,  Krishnaiah  (1967)  used  linear  combinations  of  the  elements 
of  the  covariance  matrices  for  making  comparisons.  Krishnaiah  and 
Rizvi  (1966)  used  several  linear  combinations  of  the  elements  of  the 
mean  vectors  for  comparison  and  studied  procedures  to  select  a  subset 
containing  good  populations  (defined  through  comparison  with  the 
control).  For  more  details,  reference  can  also  be  made  to  Gupta  and 
Panchapakesan  (1979). 


4.  SELECTION  FROM  A  MULTINOMIAL  POPULATION 

Let $p_1, \ldots, p_k$ denote the unknown cell probabilities of a $k$-cell
multinomial distribution. The ordered cell probabilities are denoted by
$p_{[1]} \le \cdots \le p_{[k]}$. Gupta and Nagel (1967) proposed and
studied procedures for selecting the most (least) probable cell based on a
single sample of size $n$. Let $X_1, \ldots, X_k$ denote the cell counts.
Their procedure for selecting the most probable cell is

$$R_8:\ \text{Select the } i\text{th cell if and only if } X_i \ge X_{[k]} - D \tag{26}$$

and the procedure for selecting the least probable cell is

$$R_8':\ \text{Select the } i\text{th cell if and only if } X_i \le X_{[1]} + C \tag{27}$$

where $D = D(k, n, P^*)$ and $C = C(k, n, P^*)$ are the smallest
nonnegative integers for which the $P^*$-condition is satisfied in each case.
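Once the constants are known, applying the two fixed-sample rules to observed cell counts is mechanical: the most probable cell rule keeps every cell whose count is within $D$ of the largest count, and the least probable cell rule keeps every cell whose count is within $C$ of the smallest. A minimal Python sketch follows; the function names are hypothetical, and the values of $D$ and $C$ used in any call are illustrative inputs rather than constants satisfying the $P^*$-condition.

```python
from typing import List, Sequence


def select_most_probable(counts: Sequence[int], D: int) -> List[int]:
    """Rule (26): keep cell i when X_i >= max(X) - D."""
    top = max(counts)
    return [i for i, x in enumerate(counts) if x >= top - D]


def select_least_probable(counts: Sequence[int], C: int) -> List[int]:
    """Rule (27): keep cell i when X_i <= min(X) + C."""
    bottom = min(counts)
    return [i for i, x in enumerate(counts) if x <= bottom + C]
```

With counts `[12, 9, 4, 5]`, taking `D = 3` retains cells 0 and 1, while `C = 1` retains cells 2 and 3. The size of the selected subset is determined by the data, as the subset selection formulation intends.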

An interesting point about $R_8$ and $R_8'$ is that, unlike
analogous rules for normal means, normal variances, etc., the analyses
in the maximum and minimum cases do not run parallel. The LFC for
either procedure is completely known only when $k = 2$. In this case, it
is given by $p_1 = p_2 = 1/2$. For $k > 2$, the LFC (in terms of the
ordered $p_i$) is of the type $(0, \ldots, 0, s, p, \ldots, p)$, $s \le p$,
in the case of $R_8$ and is of the type $(p, \ldots, p, q)$, $p \le q$, in
the case of $R_8'$. An alternative to $R_8$ is the inverse sampling
selection rule of Panchapakesan (1971, 1973). Observations are made one at
a time until the cell count reaches a predetermined integer $M$ in one of
the cells. At termination, let $X_1, \ldots, X_k$ be the cell counts (one
of them is $M$). The selection rule is

$$R_9:\ \text{Select the } i\text{th cell if and only if } X_i \ge M - D \tag{28}$$

where $D$ $(0 \le D \le M)$ is the smallest nonnegative integer for which
the $P^*$-condition is satisfied. For $R_9$, the infimum of PCS occurs when
all the cell probabilities are equal.
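The inverse sampling scheme is easy to simulate: draw one observation at a time, stop as soon as some cell count reaches $M$, and then retain every cell whose count is at least $M - D$. The sketch below illustrates this with the standard-library `random.choices`; the function name, the seed argument and the example probabilities are assumptions made for the illustration.

```python
import random
from typing import List, Sequence, Tuple


def inverse_sampling_select(probs: Sequence[float], M: int, D: int,
                            seed: int = 0) -> Tuple[List[int], List[int]]:
    """Sample one observation at a time until some cell count hits M,
    then apply rule (28): keep cell i when X_i >= M - D.

    Returns (final cell counts, indices of the selected cells).
    """
    rng = random.Random(seed)
    cells = range(len(probs))
    counts = [0] * len(probs)
    while max(counts) < M:
        i = rng.choices(cells, weights=probs)[0]  # one multinomial trial
        counts[i] += 1
    selected = [i for i, x in enumerate(counts) if x >= M - D]
    return counts, selected
```

Because sampling stops the moment a count reaches $M$, exactly one cell has count $M$ at termination, and that cell is always in the selected subset. The total number of observations is random, in contrast with the fixed-sample rule.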

Again, for selecting the most probable cell, Gupta and Huang
(1975) proposed the rule

$$R_{10}:\ \text{Select the } i\text{th cell if and only if } X_i + 1 \ge c X_{[k]} \tag{29}$$

where $c = c(k, N, P^*) \in (0, 1)$ is the largest number for which the
$P^*$-condition is met. The motivation for the rule $R_{10}$ comes from
their conditional selection rules for Poisson populations. A conservative
value of $c$ can be obtained from their results for Poisson populations.

Recently, Chen (1985) considered an inverse sampling selection
rule for selecting a subset containing the least probable cell. For his
procedure $R_{11}$, the observations are made one at a time until either
(1) the count in any cell reaches $r$, or (2) $(k - 1)$ cells reach a count
of at least $r'$ $(1 \le r' \le r + 1)$. If (1) occurs before (2), the rule
$R_{11}$ selects the cells with counts $X_i < r'$. If (2) occurs before (1),
then $R_{11}$ selects the cell with count $X_i < r'$. The constants $r$ and
$r'$ are to be chosen so as to satisfy the $P^*$-condition. It has been
shown by Chen (1985) that the infimum of $P(CS|R_{11})$ occurs when all the
cell probabilities are equal.

Minimax  subset  selection  rules  have  been  investigated  by  Berger 
(1979)  and  Berger  and  Gupta  (1980).  For  selecting  the  least  probable 
cell,  Berger  (1980)  investigated  a  minimax  subset  selection  rule  taking 
as  loss  the  size  of  the  selected  subset  or  the  number  of  non-best  cells 
selected.  In  another  paper,  Berger  (1982)  investigated  minimax  and 
admissible  subset  selection  rules  for  the  least  probable  cell  taking  as 
the loss the number of non-best cells selected. His rule, however,
satisfies the $P^*$-condition only if $P^*$ is sufficiently large. For the
corresponding procedure for the most probable cell, the $P^*$-condition
has been verified only in certain special cases.

The importance of multinomial selection rules is accented by
the fact that they provide distribution-free procedures. Suppose that
$\pi_1, \ldots, \pi_k$ have continuous distributions $F_{\theta_i}$,
$i = 1, \ldots, k$. We assume that $\{F_\theta\}$ is a stochastically
increasing family in $\theta$. Let $p_i$ denote the probability that in a
set of $k$ observations, one from each distribution, the observation from
$\pi_i$ is the largest, $i = 1, \ldots, k$. Selecting
the stochastically largest (smallest) population is then equivalent to
selecting the population associated with the largest (smallest) $p_i$. If
we take observations a vector at a time and note which population
yielded the largest observation, the problem can be converted to the
multinomial cell problem.
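The conversion to a multinomial cell problem can be sketched in a few lines: each vector of $k$ observations contributes one count to the cell of the population that produced the largest value. The helper below is a hypothetical illustration of that bookkeeping; the resulting counts can then be fed to any of the multinomial selection rules of this section.

```python
from typing import List, Sequence


def winners_to_cell_counts(vectors: Sequence[Sequence[float]]) -> List[int]:
    """Convert vector observations into multinomial cell counts.

    Each element of `vectors` holds one observation from each of the k
    populations; cell i is incremented when population i yields the largest
    value in that vector. Ties have probability zero for continuous
    distributions and are resolved here in favour of the lowest index.
    """
    k = len(vectors[0])
    counts = [0] * k
    for obs in vectors:
        winner = max(range(k), key=lambda i: obs[i])
        counts[winner] += 1
    return counts
```

For example, the four vectors `[1.0, 2.0, 0.5]`, `[3.1, 0.2, 0.4]`, `[0.1, 0.9, 0.3]` and `[2.0, 2.5, 2.4]` give the counts `[1, 3, 0]`, since the second population produces the largest value three times.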


5.  SELECTION  FROM  SEVERAL  MULTINOMIAL 
POPULATIONS 

Let $\pi_1, \ldots, \pi_k$ be $k$ multinomial populations each with $m$
cells and let the unknown cell probabilities of $\pi_i$ be
$p_{i1}, \ldots, p_{im}$; $i = 1, \ldots, k$. Let
$H_i = H(p_{i1}, \ldots, p_{im}) = -\sum_{j=1}^{m} p_{ij} \log p_{ij}$, the
Shannon entropy function associated with $\pi_i$. The function is a measure
of the uncertainty with regard to the nature of the outcomes from $\pi_i$.
We want to select the population associated with the largest $H_i$. For
$m = 2$, the problem reduces to that of selecting the binomial population
associated with the largest
$\psi(\theta_i) = -\theta_i \log \theta_i - (1 - \theta_i) \log (1 - \theta_i)$,
where $\theta_i$ is the success probability. In this case, Gupta and Huang
(1976) proposed the rule

$$R_{12}:\ \text{Select } \pi_i \text{ if and only if } \psi\left(\frac{X_i}{n}\right) \ge \max_{1 \le j \le k} \psi\left(\frac{X_j}{n}\right) - d_{12} \tag{30}$$

where $X_i$ is the number of successes in $n$ trials associated with
$\pi_i$, and $d_{12} = d_{12}(k, n, P^*)$ is the smallest nonnegative
constant such that $0 \le d_{12} \le \psi([n/2]/n)$ for which the
$P^*$-condition is satisfied. Here $[n/2]$ denotes the largest integer
$\le n/2$. The infimum of $P(CS|R_{12})$ takes place when
$\theta_1 = \cdots = \theta_k = \theta$. However, the common value $\theta$
for which the infimum takes place is not known. Gupta and Huang (1976) have
obtained a conservative value of $d_{12}$ using the approach of Gupta, Huang
and Huang (1975), who used this approach to obtain a conservative
value for the constant defining the procedure of Gupta and Sobel
(1960) for selecting the binomial population with the largest success
probability. For more details on this, see Gupta and Panchapakesan
(1979, 1985).
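To make the binomial entropy rule concrete, recall that $\psi(\theta) = -\theta \log \theta - (1 - \theta) \log(1 - \theta)$ and that a population is retained when $\psi(X_i/n)$ is within $d_{12}$ of the largest observed value. The sketch below applies the rule to given success counts; the function names are hypothetical, and the constant passed in any call is an illustrative input, not a value satisfying the $P^*$-condition.

```python
from math import log
from typing import List, Sequence


def binary_entropy(theta: float) -> float:
    """psi(theta) = -theta*log(theta) - (1-theta)*log(1-theta), with 0*log(0) = 0."""
    return -sum(t * log(t) for t in (theta, 1.0 - theta) if t > 0.0)


def select_by_entropy(successes: Sequence[int], n: int, d12: float) -> List[int]:
    """Rule (30): keep population i when psi(X_i/n) >= max_j psi(X_j/n) - d12."""
    psi = [binary_entropy(x / n) for x in successes]
    cutoff = max(psi) - d12
    return [i for i, value in enumerate(psi) if value >= cutoff]
```

With $n = 20$ trials and success counts `[10, 3, 18, 12]`, a constant of `0.05` retains populations 0 and 3, whose observed success proportions (0.5 and 0.6) are closest to 1/2, where $\psi$ attains its maximum $\log 2$.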

To discuss the selection procedure of Gupta and Wong (1977) in
the case of $m > 2$, let $a = (a_1, \ldots, a_m)$ and
$A_r = \sum_{i=r}^{m} a_{[i]}$, where $a_{[1]} \le \cdots \le a_{[m]}$ are
the ordered components. Vector $a = (a_1, \ldots, a_m)$
is said to majorize vector $b = (b_1, \ldots, b_m)$ of the same dimension
(written $a \succ b$) if $A_r \ge B_r$ for $r = 2, \ldots, m$, and
$A_1 = B_1$. Further, a function $f$ is said to be Schur-concave if
$f(x) \le f(x')$ whenever $x \succ x'$.
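The majorization ordering and the Schur-concavity of the entropy can be checked directly on small examples. In the sketch below (hypothetical helper names, standard library only), `majorizes(a, b)` tests the partial-sum conditions $A_1 = B_1$ and $A_r \ge B_r$ for $r = 2, \ldots, m$, with a small tolerance for floating-point sums, and `entropy` evaluates $-\sum_j p_j \log p_j$.

```python
from math import isclose, log
from typing import Sequence


def majorizes(a: Sequence[float], b: Sequence[float], tol: float = 1e-12) -> bool:
    """True when a majorizes b in the sense defined above.

    With components sorted in increasing order, A_r is the sum of the
    m - r + 1 largest components; a majorizes b when A_1 = B_1 and
    A_r >= B_r for r = 2, ..., m.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    sa, sb = sorted(a), sorted(b)
    if not isclose(sum(sa), sum(sb), abs_tol=tol):
        return False
    return all(sum(sa[r:]) >= sum(sb[r:]) - tol for r in range(1, len(sa)))


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy -sum p_j log p_j, with 0*log(0) taken as 0."""
    return -sum(p * log(p) for p in probs if p > 0.0)
```

For instance, $(0.7, 0.2, 0.1)$ majorizes $(0.4, 0.3, 0.3)$, and accordingly the entropy of the first vector is smaller than that of the second, as Schur-concavity requires.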

In our selection problem, we assume that there is a population whose
associated vector of cell probabilities is majorized by the associated
vector of cell probabilities of any other population. Such a population
will have the largest $H_i$ because the entropy function is
Schur-concave. Let $\varphi_i = \varphi(X_{i1}/n, \ldots, X_{im}/n)$,
where $\varphi$ is a Schur-concave function, and
$X_{i1}, \ldots, X_{im}$ are the cell counts based on $n$ independent
observations from $\pi_i$, $i = 1, \ldots, k$. Gupta and Wong (1977)
proposed the rule

$R_{13}$: Select $\pi_i$ if and only if $\varphi_i \ge \max_{1 \le j \le k} \varphi_j - d_{13}$  (31)

where $d_{13} = d_{13}(k, m, n, P^*)$ is the smallest positive constant
for which the $P^*$-condition is satisfied. Gupta and Wong obtained a
conservative value of $d_{13}$ using the idea of conditioning as in the
paper of Gupta and Huang (1976).
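
A rule of the form (31) can be sketched as follows, assuming that the
statistic for each population is a Schur-concave function evaluated at
the observed cell proportions and taking the Shannon entropy as that
function. The function names, the cell counts, and d = 0.02 are
hypothetical; the constant is not derived from the $P^*$-condition.

```python
import math

def entropy(p):
    """Shannon entropy with the convention 0*log(0) = 0."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def select_r13(cell_counts, d, phi=entropy):
    """cell_counts[i] holds (X_i1, ..., X_im); every row must total the same n."""
    n = sum(cell_counts[0])
    if any(sum(row) != n for row in cell_counts):
        raise ValueError("every population needs the same sample size n")
    scores = [phi([x / n for x in row]) for row in cell_counts]
    threshold = max(scores) - d
    return [i for i, s in enumerate(scores) if s >= threshold]

# Hypothetical data: k = 4 populations, m = 3 cells, n = 30 observations each.
counts = [(10, 10, 10), (20, 6, 4), (12, 9, 9), (28, 1, 1)]
selected = select_r13(counts, d=0.02)
print(selected)  # [0, 2]
```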


6.  SELECTION  OF  VARIABLES  IN  LINEAR  REGRESSION 

In applying regression analysis in practical situations for prediction
purposes, we are often faced with a large number of independent
variables. In such situations, it may be sufficient to consider a subset
of these predictor variables for “adequate” prediction. There arises then
a problem of choosing a “good” subset of these variables. Hocking (1976)
and Thompson (1978a, b) have reviewed several criteria and techniques
that have been used in practice. However, these are ad hoc procedures and
are not designed to control the probability of selecting the important
variables. McCabe and Arvesen (1974) and Arvesen and McCabe (1975) were
the first to formulate this problem in the framework of Gupta-type subset
selection.


Consider  the  standard  linear  model 


$Y = X\beta + \varepsilon$  (32)

where $X$ is an $N \times p$ known matrix of rank $p < N$, $\beta$ is a
$p \times 1$ parameter vector, and $\varepsilon \sim N(0, \sigma^2 I_N)$.
This model with $p$ independent variables is considered as the “true”
model. Now, consider all reduced models that are formed by taking all
possible subsets of size $t$ ($< p$) from the $p$ independent variables.
These models are described by

$Y = X_i \beta_i + \varepsilon_i$, $i = 1, \ldots, k = \binom{p}{t}$,  (33)

where $X_i$ is an $N \times t$ matrix (of rank $t$), $\beta_i$ is a
$t \times 1$ parameter vector, and
$\varepsilon_i \sim N(0, \sigma_i^2 I_N)$. It should be noted that the
models in (33) are considered for prediction purposes and must be
compared under the true model assumptions. The expectations of the
residual mean squares in the corresponding ANOVA, evaluated under the
true model assumptions, are $\sigma_i^2$, $i = 1, \ldots, k$. For the
goal of selecting the design $X_i$ (or the corresponding set of
independent variables) associated with $\sigma_{[1]}^2$, the smallest of
the $\sigma_i^2$, Arvesen and McCabe (1975) proposed the rule

$R_{14}$: Select the design $X_i$ if and only if $SS_i \le \frac{1}{c_{14}} SS_{[1]}$  (34)

where $SS_i$ is the residual sum of squares in the ANOVA corresponding
to the design $X_i$, $SS_{[1]}$ is the smallest of the $SS_i$, and
$0 < c_{14} = c_{14}(p, t, N, P^*) < 1$ is to be chosen to satisfy the
$P^*$-condition. An exact evaluation of the constant $c_{14}$ is
difficult. Arvesen and McCabe showed that the PCS is asymptotically
($N \to \infty$) minimized when $\beta = 0$. The evaluation of $c_{14}$
is not easy even under this asymptotic LFC. McCabe and Arvesen (1974)
have given an algorithm that uses Monte Carlo methods to determine
$c_{14}$ under the asymptotic LFC for given $P^*$ and $X$.
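
The mechanics of a rule of the form (34) can be sketched in plain Python.
This is an illustration under stated assumptions, not the algorithm of
McCabe and Arvesen (1974): the helper names, the toy design, the response
vector, and the values of c are hypothetical, and c is simply supplied
rather than determined from $P^*$. Residual sums of squares are obtained
by projecting the response onto each candidate set of columns.

```python
from itertools import combinations

def residual_ss(columns, y):
    """Residual sum of squares of y after least-squares projection onto the
    span of the given columns (modified Gram-Schmidt; assumes full column rank)."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            coef = sum(a * b for a, b in zip(v, q))
            v = [a - coef * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm < 1e-12:
            raise ValueError("columns are linearly dependent")
        basis.append([a / norm for a in v])
    r = list(y)
    for q in basis:
        coef = sum(a * b for a, b in zip(r, q))
        r = [a - coef * b for a, b in zip(r, q)]
    return sum(a * a for a in r)

def select_r14(columns, y, t, c):
    """Return the size-t column subsets S with SS_S <= SS_[1] / c, plus all SS values."""
    ss = {s: residual_ss([columns[j] for j in s], y)
          for s in combinations(range(len(columns)), t)}
    cutoff = min(ss.values()) / c
    return sorted(s for s, v in ss.items() if v <= cutoff), ss

# Hypothetical design with N = 6 observations and p = 3 columns.
columns = [[1, 1, 1, 1, 1, 1],
           [1, 2, 3, 4, 5, 6],
           [1, -1, 1, -1, 1, -1]]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

selected, ss = select_r14(columns, y, t=2, c=0.1)
print(selected)  # [(0, 1), (1, 2)]
```

A smaller c widens the cutoff and retains more designs; a value of c close
to 1 retains only designs whose residual sum of squares is close to the
minimum.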

In the above formulation, the size $t$ is arbitrarily fixed. Huang and
Panchapakesan (1982) considered a different formulation taking into
consideration all possible reduced models. They considered the
regression model with $\beta' = (\beta_0, \ldots, \beta_{p-1})$ and
$X = (1, x_1, \ldots, x_{p-1})$, where $1' = (1, \ldots, 1)$ and
$x_i' = (x_{i1}, \ldots, x_{iN})$, $i = 1, \ldots, p - 1$. For fixed
$\alpha \in \{0, 1, \ldots, p - 1\}$, consider all the
$\binom{p-1}{\alpha}$ subsets of size $\alpha$ of the set of predictor
variables $\{x_1, \ldots, x_{p-1}\}$ and the corresponding reduced models
obtained from (32). Associated with these reduced models are the multiple
correlation coefficients $R_{i,\alpha}$,
$i = 1, 2, \ldots, \binom{p-1}{\alpha}$. Let
$\theta_{i,\alpha} = E(1 - R_{i,\alpha}^2)$. Any reduced model with the
associated parameter $\theta_{i,\alpha}$ is said to be inferior if
$\theta_{1,p-1} < \delta^* \theta_{i,\alpha}$, where
$\delta^* \in (0, 1)$ is a specified constant. (The parameter
$\theta_{1,p-1}$ is associated with the true model.) Huang and
Panchapakesan (1982) considered the problem of eliminating all inferior
models. A correct decision (CD) is the selection of any subset of the
models such that all inferior models are excluded from the selected
subset. They proposed and studied the procedure

$R_{15}$: Exclude a model if and only if $\hat{\theta}_{i,\alpha} > \frac{1}{c_{15}} \hat{\theta}_{1,p-1}$  (35)

where $\hat{\theta}_{i,\alpha} = 1 - R_{i,\alpha}^2$, and the constant
$c_{15} = c_{15}(N, p, P^*) > \delta^*$ is determined such that the
$P^*$-condition is satisfied.

The LFC for the rule $R_{15}$ has been established only in the
asymptotic ($N \to \infty$) sense. For evaluating the constant under the
asymptotic LFC ($\beta = 0$), Huang and Panchapakesan (1982) used an
algorithm similar to that of McCabe and Arvesen (1974).

Hsu and Huang (1982) considered the goal of selecting a subset of the
models that contains all the superior models, namely, all models for
which $\sigma_i^2 < \Delta \sigma^2$, where $\Delta > 1$ is a specified
constant. For this problem, they investigated a sequential procedure.

Gupta, Huang and Chang (1984) studied the problem of eliminating
inferior models, using the expected mean squares as the criterion for
comparing any model with the true model. Their approach is different
from those of the earlier papers in that they use simultaneous tests of
a family of hypotheses in constructing their procedure.

Now, for any reduced model, it is known that $SS_i/\sigma^2$ has (under
the full model assumptions) a noncentral chi-square distribution with
$\nu = N - p + 1$ degrees of freedom and noncentrality parameter
$\lambda_i = (X\beta)' Q_i (X\beta) / (2\sigma^2)$, where
$Q_i = I_N - X_i (X_i' X_i)^{-1} X_i'$, and $\sigma^2$ is the error
variance in the full model. Recently, Gupta and Huang (1986) have
considered the problem of eliminating inferior models, namely, those for
which $\lambda_i > \Delta > 0$, where $\Delta$ is specified in advance.
For this problem, they have proposed and investigated a two-stage
procedure.
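
Because $Q_i$ is a symmetric idempotent (projection) matrix, the
noncentrality parameter can be read as a squared distance. The
restatement below is a standard identity, not a result quoted from the
papers cited, and it uses the convention for $\lambda_i$ given above:

```latex
\lambda_i
  = \frac{(X\beta)' Q_i (X\beta)}{2\sigma^2}
  = \frac{\lVert Q_i X\beta \rVert^2}{2\sigma^2} \ge 0,
\qquad
\lambda_i = 0 \iff X\beta \in \operatorname{col}(X_i).
```

In words, $\lambda_i$ measures how far the true mean vector $X\beta$ lies
from the column space of the reduced design $X_i$, so a threshold on
$\lambda_i$ is a threshold on the bias incurred by dropping variables.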


7.  CONCLUSION 


As we have seen, multivariate selection problems have wide applications.
However, in many cases, the existing procedures have not been fully
examined in terms of their performance or the determination of the LFC.
Even the multinomial problems need to be studied more satisfactorily.
Also, the criteria employed for ranking multivariate populations usually
induce a complete ordering in the space of distributions. However, in
many practical problems, there is a need to consider a partial ordering.
There has been practically no development in this direction. Also, there
has been no work done for distributions other than multivariate normal
populations. It will be interesting to consider reliability-related
models such as increasing failure rate distributions in two or more
dimensions.